
Architectural Impact on Performance of In-memory Data Analytics: Apache Spark Case Study



Abstract

While cluster computing frameworks are continuously evolving to provide real-time data analysis capabilities, Apache Spark has managed to be at the forefront of big data analytics as a unified framework for both batch and stream data processing. However, recent studies on micro-architectural characterization of in-memory data analytics are limited to batch processing workloads. We compare the micro-architectural performance of batch processing and stream processing workloads in Apache Spark using hardware performance counters on a dual-socket server. In our evaluation experiments, we have found that batch processing and stream processing workloads have similar micro-architectural characteristics and are bounded by the latency of frequent data accesses to DRAM. For data accesses, we have found that simultaneous multi-threading is effective in hiding the data latencies. We have also observed that (i) data locality on NUMA nodes can improve performance by 10% on average, (ii) disabling next-line L1-D prefetchers can reduce the execution time by up to 14%, and (iii) multiple small executors can provide up to 36% speedup over a single large executor.
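As an illustration of finding (iii), the following is a minimal sketch of how a single large executor and several small executors might be described with standard Spark configuration properties. It is not taken from the paper: the helper name `executor_layout`, the even split of cores and memory, and the 24-core / 96 GB machine in the usage lines are all assumptions made for illustration.

```python
def executor_layout(total_cores, total_memory_gb, executors):
    """Split a machine's cores and memory evenly across `executors` executors.

    Returns Spark configuration properties as a dict of strings.
    The even split is an assumption of this sketch, not the paper's setup.
    """
    if executors < 1:
        raise ValueError("executors must be at least 1")
    if total_cores % executors != 0:
        raise ValueError("executors must evenly divide total_cores")
    return {
        "spark.executor.instances": str(executors),
        "spark.executor.cores": str(total_cores // executors),
        # Integer division: any remainder of memory is left unassigned.
        "spark.executor.memory": f"{total_memory_gb // executors}g",
    }


# Hypothetical machine: 24 cores, 96 GB available to Spark.
single_large = executor_layout(24, 96, 1)
multiple_small = executor_layout(24, 96, 4)

for name, conf in (("single large", single_large), ("multiple small", multiple_small)):
    flags = " ".join(f"--conf {k}={v}" for k, v in conf.items())
    print(f"{name}: {flags}")
```

The printed flags can be passed to `spark-submit`. Which split performs best depends on the workload and the machine, so the figures in the abstract should not be expected to carry over to other setups.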
